Towards unified secure on- and off-line analytics at scale

Authors

  • Peter Coetzee
  • Matthew Leeke
  • Stephen A. Jarvis
Abstract

Data scientists have applied various analytic models and techniques to address the oft-cited problems of large volume, high velocity data rates and diversity in semantics. Such approaches have traditionally employed analytic techniques in a streaming or batch processing paradigm. This paper presents CRUCIBLE, a first-in-class framework for the analysis of large-scale datasets that exploits both streaming and batch paradigms in a unified manner. The CRUCIBLE framework includes a domain-specific language for describing analyses as a set of communicating sequential processes, a common runtime model for analytic execution in multiple streamed and batch environments, and an approach to automating the management of cell-level security labelling that is applied uniformly across runtimes. This paper shows the applicability of CRUCIBLE to a variety of state-of-the-art analytic environments, and compares a range of runtime models for their scalability and performance against a series of native implementations. The work demonstrates the significant impact of runtime model selection, including improvements of between 2.3× and 480× between runtime models, with an average performance gap of just 14× between CRUCIBLE and a suite of equivalent native implementations. © 2014 Elsevier B.V. All rights reserved.

Similar resources

Application of Big Data Analytics in Power Distribution Network

Smart grid enhances optimization in generation, distribution and consumption of the electricity by integrating information and communication technologies into the grid. Today, utilities are moving towards smart grid applications, most common one being deployment of smart meters in advanced metering infrastructure, and the first technical challenge they face is the huge volume of data generated ...

Predicting the Impact of Climate Change on U.S. Power Grids and Its Wider Implications on National Security

We discuss our technosocial analytics research and development on predicting and assessing the impact of climate change on U.S. power-grids and the wider implications for national security. The ongoing efforts extend cutting-edge modeling theories derived from climate, energy, social sciences, and national security domains to form a unified system coupled with an interactive visual interface fo...

Towards Longitudinal Data Analytics in Parkinson’s Disease

The CloudUPDRS app has been developed as a Class I medical device to assess the severity of motor symptoms for Parkinson’s Disease using a fully automated data capture and signal analysis process based on the standard Unified Parkinson’s Disease Rating Scale. In this paper we report on the design and development of the signal processing and longitudinal data analytics microservices developed to...

Watershed Reanalysis Towards a National Cyberinfrastructure for Model-Data Integration

Reanalysis or retrospective analysis is the process of re-analyzing and assimilating climate and weather observations with the current modeling context. Reanalysis is an objective, quantitative method of synthesizing all sources of information (historical and real-time observations) within a unified framework. In this context, we propose a prototype for automated and virtualized web services so...

The Politics and Analytics of Health Policy

Let us start with an example of health policy analysis in action. Within that category of countries loosely known as ‘the West’, quite basic differences exist in attitudes to health policy and also actual health policy. Comparing the US with mainland Europe and indeed Canada, for example, one perceives a difference in attitude on the part of the majority towards collectivism and individualism i...

Journal:
  • Parallel Computing

Volume 40

Publication date: 2014